Character segmentation using visual interword constraints in a text page

نویسندگان

  • Tao Hong
  • Jonathan J. Hull
چکیده

Character segmentation is a critical preprocessing step for text recognition. In this paper a method is presented that utilizes visual inter-word constraints available in a text image to split word images into smaller image pieces. This method is applicable to machine-printed texts in which the same spacing is always used between identical pairs of characters. The visual inter-word constraints considered here include information about whether a word image is a sub-image of another word image. For example, given two word images A and B, which are "mathematical" and "the". If the short word image B is found to be a sub-image of the long word image A, the longer image A is split into three pieces, A 1 , A 2 and A 3 , where A 2 matches B, A 1 corresponds to "ma", and A 3 corresponds to "matical". The image piece A 1 can be further used to split A 3 into two parts, "ma" and "tical". This method is based purely on image processing using the visual context in a text page. No recognition is involved.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Modified Character Segmentation Algorithm for Farsi Printed Text Using Upper Contour Labelling

In this paper, a modified segmentation algorithm for printed Farsi words is presented. This algorithm is based on a previous work by Azmi that uses the conditional labeling of the upper contour to find the segmentation points. The main objective is to improve the segmentation results for low quality prints. To achieve this, various modifications on local baseline detection, contour labeling an...

متن کامل

A Modified Character Segmentation Algorithm for Farsi Printed Text Using Upper Contour Labelling

In this paper, a modified segmentation algorithm for printed Farsi words is presented. This algorithm is based on a previous work by Azmi that uses the conditional labeling of the upper contour to find the segmentation points. The main objective is to improve the segmentation results for low quality prints. To achieve this, various modifications on local baseline detection, contour labeling an...

متن کامل

The role of interword spacing in reading Japanese: An eye movement study

The present study investigated the role of interword spacing in a naturally unspaced language, Japanese. Eye movements were registered of native Japanese readers reading pure Hiragana (syllabic) and mixed Kanji-Hiragana (ideographic and syllabic) text in spaced and unspaced conditions. Interword spacing facilitated both word identification and eye guidance when reading syllabic script, but not ...

متن کامل

Cherry Blossom: A System for Japanese Character Recognition

A general purpose Japanese character recognition system, Cherry Blossom, has been developed at CEDAR in past years. It is designed to recognize Japanese document images in low resolution or with poor print quality. The system includes modules for page skew correction, document segmentation, text segmentation, character recognition and postprocessing. The API code for each module has been develo...

متن کامل

Segmenting Text Images With Massively Parallel Machines

Image segmentation, the partitioning of an image into meaningful parts, is a major concern of any computer vision system. The meaningful parts of a text image are lines of text, words and characters. In this paper, the segmentation of pages of text into lines of text and lines of text into characters on a parallel machine will be examined. Using a parallel machine for text image segmentation al...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1995